An Empirical Comparison of Text Categorization Methods
نویسندگان
چکیده
In this paper we present a comprehensive comparison of the performance of a number of text categorization methods in two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models. We report the results obtained using the Mean Reciprocal Rank as a measure of overall performance, a commonly used evaluation measure for question answering tasks. We argue that this evaluation measure is also very well suited for text categorization tasks. Our results show that overall, SVMs and k-NN LSA perform better than the other methods, in a statistically significant way.
منابع مشابه
A Comparison of Two Learning Algorithms for TextCategorization (
This paper examines the use of inductive learning to categorize natural language documents into predeened content categories. Categorization of text is of increasing importance in information retrieval and natural language processing systems. Previous research on automated text categorization has mixed machine learning and knowledge engineering methods, making it diicult to draw conclusions abo...
متن کاملImproving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
متن کاملA Comparison of Text Categorization Methods
In this paper firstly I have compared Single Label Text Categorization with Multi Label Text Categorization in detail then I have compared Document Pivoted Categorization with Category Pivoted Categorization in detail. For this purpose I have given the general definition of Text Categorization with its mathematical notation for the purpose of its frugality and cost effectiveness. Then with the ...
متن کاملAn Empirical Study of Category Skew on Feature Selection for Text Categorization
In this paper, we present an empirical comparison of the effects of category skew on six feature selection methods. The methods were evaluated on 36 datasets generated from the 20 Newsgroups, OHSUMED, and Reuters-21578 text corpora. The datasets were generated to possess particular category skew characteristics (i.e., the number of documents assigned to each category). Our objective was to dete...
متن کاملSolving multi-label text categorization problem using support vector machine approach with membership function
The pervasiveness of information available on the internet means that increasing numbers of documents must be classified. Text categorization is not only undertaken by domain experts, but also by automatic text categorization systems. Therefore, a text categorization system with a multi-label classifier is necessary to process the large number of documents. documents. Data mapping is performed ...
متن کامل